
[PERF IMPROVEMENT] Add amdgpu-waves-per-eu fn_attrs to kernels. #39

Merged
yaoliu13 merged 1 commit into amd-integration from perf/kejoseph/configure-waves-per-eu
May 2, 2026


@kevinjosephamd kevinjosephamd commented Apr 27, 2026

  • Depends on quadrants#11, "feat(amdgpu): per-kernel LLVM function attributes via @qd.kernel(fn_attrs=...)", which adds per-kernel fn_attrs support.
  • Values were derived from per-kernel occupancy sweeps that balance occupancy against register spill pressure. Each sweep ran a sample workload over combinations of min and max, with both parameters taking values from 1 to 5.
  • From a portability perspective, this is not a sustainable long-term solution. These values are tuned for the current hardware and will likely not translate well to other AMD GPU architectures or classes.
| Kernel | (min, max) |
| --- | --- |
| kernel_step_1 | (3, 4) |
| kernel_step_2 | (1, 4) |
| func_solve_init | (2, 4) |
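
For readers who want to reproduce this kind of sweep, here is a minimal sketch of the idea, not the script actually used for this PR. `sweep_waves_per_eu`, `run_workload`, and `fake_workload` are hypothetical names. The sketch assumes that lower timings are better and that only pairs with min <= max are worth trying.

```python
from itertools import product
from typing import Callable, Dict, Tuple

Pair = Tuple[int, int]


def sweep_waves_per_eu(
    run_workload: Callable[[int, int], float],
    lo: int = 1,
    hi: int = 5,
) -> Tuple[Pair, Dict[Pair, float]]:
    """Time a workload for every (min, max) pair in [lo, hi] with min <= max.

    `run_workload` is a hypothetical callback that rebuilds the kernel with
    the given waves-per-EU hint and returns a runtime (lower is better).
    """
    timings: Dict[Pair, float] = {}
    for wmin, wmax in product(range(lo, hi + 1), repeat=2):
        if wmin > wmax:
            # Assumption: a minimum above the maximum is not a useful hint.
            continue
        timings[(wmin, wmax)] = run_workload(wmin, wmax)
    best = min(timings, key=timings.get)
    return best, timings


def fake_workload(wmin: int, wmax: int) -> float:
    """Stand-in cost model for demonstration only; not measured data."""
    return 1.0 + abs(wmin - 3) + abs(wmax - 4)


best, timings = sweep_waves_per_eu(fake_workload)
print(best, len(timings))
```

In a real sweep, `run_workload` would launch the sample workload on the GPU and return a measured time. The stand-in cost model only exists to make the sketch runnable.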

@kevinjosephamd kevinjosephamd force-pushed the perf/kejoseph/configure-waves-per-eu branch from 4811714 to c1643cb April 27, 2026 05:27
@kevinjosephamd kevinjosephamd changed the title from "Add amdgpu-waves-per-eu fn_attrs to 4 hot rigid/constraint kernels." to "[PERF IMPROVEMENT] Add amdgpu-waves-per-eu fn_attrs to 4 hot rigid/constraint kernels." Apr 27, 2026
@kevinjosephamd kevinjosephamd changed the title from "[PERF IMPROVEMENT] Add amdgpu-waves-per-eu fn_attrs to 4 hot rigid/constraint kernels." to "[PERF IMPROVEMENT] Add amdgpu-waves-per-eu fn_attrs to 4 rigid/constraint kernels." Apr 27, 2026
@yaoliu13
Collaborator

/run-ci

@yaoliu13
Collaborator

/run-ci

@kevinjosephamd kevinjosephamd force-pushed the perf/kejoseph/configure-waves-per-eu branch 4 times, most recently from d352648 to cb415d1 April 30, 2026 23:35
@kevinjosephamd kevinjosephamd changed the title from "[PERF IMPROVEMENT] Add amdgpu-waves-per-eu fn_attrs to 4 rigid/constraint kernels." to "[PERF IMPROVEMENT] Add amdgpu-waves-per-eu fn_attrs to kernels." May 1, 2026
Depends on ROCm/quadrants#11 (per-kernel fn_attrs support).
Values picked from a per-kernel sweep of (min,max) occupancy hints:
  kernel_step_1:               3,4
  kernel_step_2:               1,4
  func_solve_init:             2,4
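
The `3,4` style values above match the `"min,max"` string form that LLVM's AMDGPU backend uses for the `amdgpu-waves-per-eu` function attribute. The sketch below renders the chosen pairs in that form. The dictionary and helper names are hypothetical, and how `@qd.kernel(fn_attrs=...)` expects its argument is not shown in this PR, so it is not modeled here.

```python
from typing import Dict, Tuple

# (min, max) pairs taken from the commit message above.
WAVES_PER_EU: Dict[str, Tuple[int, int]] = {
    "kernel_step_1": (3, 4),
    "kernel_step_2": (1, 4),
    "func_solve_init": (2, 4),
}


def waves_per_eu_attr(wmin: int, wmax: int) -> str:
    """Render a (min, max) pair as LLVM IR attribute text.

    The 1 <= min <= max check is a sanity check for this sketch only.
    """
    if not 1 <= wmin <= wmax:
        raise ValueError(f"expected 1 <= min <= max, got ({wmin}, {wmax})")
    return f'"amdgpu-waves-per-eu"="{wmin},{wmax}"'


# Map each kernel name to its rendered attribute string.
attrs = {name: waves_per_eu_attr(*pair) for name, pair in WAVES_PER_EU.items()}
for name, attr in attrs.items():
    print(f"{name}: {attr}")
```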
@kevinjosephamd kevinjosephamd force-pushed the perf/kejoseph/configure-waves-per-eu branch from cb415d1 to a89127c May 1, 2026 14:57
@kevinjosephamd
Author

/run-ci

@kevinjosephamd kevinjosephamd requested a review from rtmadduri May 1, 2026 15:39

@jamesETsmith jamesETsmith left a comment

@kevinjosephamd can you create an issue to follow up on this in the future to make sure it's not hurting performance on different workloads (like you mention in the PR description)?


@gpinkert gpinkert left a comment

nice work

@yaoliu13
Collaborator

yaoliu13 commented May 1, 2026

1355088 and 5044

Collaborator

@yaoliu13 yaoliu13 left a comment

LGTM

@yaoliu13
Collaborator

yaoliu13 commented May 1, 2026

Waiting for pre-submit of #60 and ROCm/quadrants#15

@yaoliu13
Collaborator

yaoliu13 commented May 2, 2026

1355088 and 5044

@yaoliu13 yaoliu13 merged commit dd02a34 into amd-integration May 2, 2026
@yaoliu13 yaoliu13 deleted the perf/kejoseph/configure-waves-per-eu branch May 2, 2026 07:04

4 participants